fix(excel-html): chart overlays match Excel position and size #152

Open
dragonwhites wants to merge 3703 commits into iOfficeAI:main from dragonwhites:fix/excel-html-chart-overlays
Conversation

@dragonwhites

Contributor

The HTML preview (view <file> html) renders charts/shapes/pictures as absolutely-positioned overlays on the sheet grid. Comparing the output to Excel surfaced three position/size issues, fixed here as three atomic commits:

1. Chart overlays don't fill their anchor box, and render about one column too narrow.
The card was sized from its width alone (the inner SVG used height:auto), so the plot never grew into the height left below the title — leaving an empty gap under the chart. And the overlay box width summed whole spanned columns but dropped the partial-column EMU offset, while EstimateChartSize fell back to 48pt (64px) for default columns instead of the grid's ~44.27pt (59px) — together clamping the card ~one column narrower than Excel. The card is now a flex column (height:100%) that fills the box, and the box is sized from the chart's own EstimateChartSize (offset included, grid-aligned column metric).

2. Default/empty rows collapse below Excel's 15pt height.
Rows without an explicit height got no height attribute, so the grid let them shrink to content — empty rows ~17px and 11pt data rows ~22px, vs Excel's 15pt (~20px). That distorts the grid and drifts it out of step with the overlay anchor math. Every row now gets the explicit-or-default height; cell padding is trimmed (2px→1px) so default 11pt rows land on 15pt. Rows with an explicit/auto-fit height (rotated text, large fonts) keep their value.

3. Stale .chart-container margin offsets the overlay a row low.
.chart-container carried margin: 16px auto from an older in-flow layout. With charts now absolutely positioned, that top margin pushed each visible card down ~16px (to the bottom of its anchor row) and overflowed the box bottom. Set to margin: 0.
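
The point and pixel figures quoted in items 1 and 2 are consistent with the usual CSS convention of 96 px per inch against 72 pt per inch. A minimal sketch of that conversion follows; `pt_to_px` is an illustrative helper, not a function from this PR.

```python
def pt_to_px(points: float) -> int:
    """Convert typographic points to CSS pixels (96 px/in, 72 pt/in), rounded."""
    return round(points * 96 / 72)


print(pt_to_px(48))     # old default-column fallback   -> 64
print(pt_to_px(44.27))  # grid-aligned default column   -> 59
print(pt_to_px(15))     # Excel's default row height    -> 20
```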

Validation. Rendered a multi-sheet workbook (charts, conditional formatting, sparklines, rich formatting, formulas, pivot) and compared to Excel. A 480px chart now renders ~444px ending mid-column-N (was clamped a column short); every gallery chart sits exactly on its anchor row (top + bottom); default grid rows are 20px (were 17/22px). No regressions across CF (color scales, data bars, icon sets), sparklines, rotation/number-formats/merges, formulas, or the pivot cross-tab.

The three commits are independent and can be split into separate PRs if you prefer one atomic fix per PR — happy to do that.

🤖 Generated with Claude Code

goworm added 30 commits June 12, 2026 17:37
chart title color emitted bare hex (color:FF0000) and the axis/legend/
gridline color self-sync re-emitted bare hex (fill="0000FF") — invalid
CSS/SVG, so browsers rendered them black. Route all through a CssHexColor
helper (idempotent for #/named values). Also honor the pptx title color
(was ignored, using theme tx1). Shared renderer: xlsx/pptx/docx all fixed.
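
A rough sketch of what an idempotent hex-color normalizer of this kind might look like. The function name `css_hex_color` and the decision to recognise only bare 6-digit values are assumptions for illustration; the real CssHexColor helper may accept more forms.

```python
import re

# Assumption: only a bare 6-digit RRGGBB value needs the '#' prefix.
_BARE_HEX6 = re.compile(r"^[0-9A-Fa-f]{6}$")


def css_hex_color(value: str) -> str:
    """Prefix bare RRGGBB with '#'; leave '#...' and named colors untouched."""
    v = value.strip()
    if _BARE_HEX6.match(v):
        return "#" + v
    return v


print(css_hex_color("FF0000"))                 # '#FF0000'
print(css_hex_color(css_hex_color("FF0000")))  # still '#FF0000' (idempotent)
print(css_hex_color("red"))                    # 'red'
```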
a line shape collapsed to one dimension (height=0 or width=0) rendered
invisible: the solid path drew a degenerate strip and the SVG-dash path
computed a negative rect and vanished. Route a non-line-preset outlined,
text-less shape whose box collapses through RenderConnector, which draws
zero-dimension lines with the correct color/width/dash.
writing a value to a table header cell produced two corruptions that
made the workbook unopenable in Excel (our lenient reader masked both):
1. the cell kept a stale <is> inline-string placeholder alongside the
   new <v> (dual value children) — clear InlineString on every value/
   formula/clear write path.
2. the table column name (Column1/Column2 auto-placeholder) no longer
   matched the overwritten header text — re-sync <tableColumn name> to
   the header cell on writes within a table's header row.
Verified: real Excel now opens the previously-rejected file.
the sparkline SVG stroke/fill was hardcoded to #4472C4, ignoring the
stored series color (x14 colorSeries). Read group.SeriesColor and use it
for the line stroke and column-bar fill, falling back to the default
blue only when none is stored.
margin-top was emitted as spaceBefore minus the previous paragraph's
spaceAfter (a stale 'flexbox doesn't collapse' assumption). Paragraphs
are block-flow siblings whose margins collapse to the max, so the
subtraction shrank every gap — and for small spaceBefore it dropped to
0, ignoring the spacing entirely. Emit the full spaceBefore; CSS margin
collapse then yields max(prevAfter, thisBefore), matching Word.
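
The arithmetic behind this can be sketched as follows, assuming the previous paragraph emits its spaceAfter as margin-bottom and that adjacent non-negative block margins collapse to their maximum. The function names are illustrative only.

```python
def collapsed_gap(prev_margin_bottom: float, margin_top: float) -> float:
    """CSS collapse of two adjoining non-negative vertical margins."""
    return max(prev_margin_bottom, margin_top)


def gap_with_subtraction(prev_after: float, this_before: float) -> float:
    # Old behaviour: margin-top = spaceBefore - previous spaceAfter (floored at 0).
    margin_top = max(0.0, this_before - prev_after)
    return collapsed_gap(prev_after, margin_top)


def gap_with_full_before(prev_after: float, this_before: float) -> float:
    # New behaviour: emit the full spaceBefore and let the browser collapse.
    return collapsed_gap(prev_after, this_before)


print(gap_with_subtraction(8.0, 12.0))  # 8.0  (gap too small)
print(gap_with_full_before(8.0, 12.0))  # 12.0 (max(prevAfter, thisBefore))
```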
bar/column/stacked/area/pie/doughnut fills fell back to opacity=0.85
when a series declared no explicit a:alpha, washing every default chart
~15% lighter than native Office (which renders opaque). Default the
FillOpacity fallback to 1.0; an explicit a:alpha still overrides. Shared
renderer — xlsx/pptx/docx all fixed.
a w:br type=page emitted the page-transition </div></div> markup while
the run <span> and paragraph <p> were still open, producing invalid
nesting (browsers auto-recovered). Close </span></p> before the break
marker and reopen an identical <p> for any remaining runs, mirroring the
column-break path. Nesting validator now clean; break still paginates.
cellIs equal/notEqual rules against a text operand never applied in the
preview — the evaluator gated on a numeric cell value and compared
numerically only. Now equal/notEqual fall back to a case-insensitive
string compare (stripping the quotes from the "..." formula literal)
when either side is non-numeric; numeric rules unchanged.
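
A minimal sketch of the fallback described above: compare numerically when both sides parse as numbers, otherwise strip the quotes from the formula literal and compare case-insensitively. `cell_is_equal` is a hypothetical name, not the evaluator's actual API.

```python
from typing import Optional


def _as_number(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def cell_is_equal(cell_value: str, operand: str) -> bool:
    """Evaluate a cellIs 'equal' rule against a numeric or quoted-text operand."""
    a, b = _as_number(cell_value), _as_number(operand)
    if a is not None and b is not None:
        return a == b  # numeric rules behave as before
    # Strip the surrounding quotes of a "..." formula literal, if present.
    if len(operand) >= 2 and operand[0] == operand[-1] == '"':
        operand = operand[1:-1]
    return cell_value.casefold() == operand.casefold()


print(cell_is_equal("Done", '"done"'))  # True
print(cell_is_equal("10", "10.0"))      # True
```

A notEqual rule would simply negate the result.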
…ed by Office

an explicit color on a line/scatter/radar series was written to a bare
<c:spPr><a:solidFill> — which real PowerPoint ignores for the line
stroke (it uses the theme color), while the HTML preview read it and
showed the requested color. The preview thus lied. Write line-based
series colors into <a:ln><a:solidFill> so Office honors them; reader
prefers a:ln then falls back to bare fill. Bar/column/pie/area keep the
bare area fill. Verified in real PowerPoint: line now renders red.
a General/unformatted numeric cell holding an IEEE-754-noisy value (e.g.
99.98999999999999 from computed/imported/round-tripped data) showed the
raw double; Excel rounds General to ~15 significant digits (99.99).
FormatGeneralNumber's normal-magnitude branch now uses G15. Scientific
branch and explicitly-number-formatted cells unchanged.
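
The effect of rounding to 15 significant digits can be reproduced with Python's `.15g` format, used here as a stand-in for the .NET G15 specifier (an assumption about equivalence for normal-magnitude values only). `format_general` is an illustrative name.

```python
def format_general(value: float) -> str:
    """Render a double with at most 15 significant digits, dropping trailing zeros."""
    return format(value, ".15g")


print(repr(99.98999999999999))            # raw double as stored
print(format_general(99.98999999999999))  # 99.99
print(format_general(0.1 + 0.2))          # 0.3
```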
filled radar (radarStyle=filled) series polygons rendered at
fill-opacity=0.2 — nearly invisible vs PowerPoint's vibrant ~0.7 fill.
Raised to 0.7, scoped to the radar renderer; line-chart area-under
fills untouched. Verified against native PowerPoint.
font-family always terminated in sans-serif, so Courier New/Consolas
fell back to a proportional font when unavailable, breaking monospace
alignment. Pick the generic family by font name: monospace fonts ->
monospace, serif fonts -> serif, else sans-serif.
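
One way to pick the generic family by font name is sketched below. The font lists are assumptions chosen for illustration (the commit names only Courier New and Consolas), and `generic_family` / `font_family_css` are hypothetical helpers.

```python
# Illustrative lookup tables; the renderer's actual lists are not shown in the commit.
MONOSPACE_FONTS = {"courier new", "consolas", "courier", "menlo", "monaco"}
SERIF_FONTS = {"times new roman", "georgia", "cambria", "garamond"}


def generic_family(font_name: str) -> str:
    key = font_name.strip().lower()
    if key in MONOSPACE_FONTS:
        return "monospace"
    if key in SERIF_FONTS:
        return "serif"
    return "sans-serif"


def font_family_css(font_name: str) -> str:
    """Build a font-family value that ends in the matching generic family."""
    return f"'{font_name}', {generic_family(font_name)}"


print(font_family_css("Consolas"))  # 'Consolas', monospace
print(font_family_css("Calibri"))   # 'Calibri', sans-serif
```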
non-stacked area series were emitted in reverse index order, so the
wrong series sat on top vs PowerPoint (which paints higher-idx on top).
Paint series0 first (bottom) ... seriesN last (top). Verified against
real PowerPoint: same series now on top in both.
a chart series/categories given as a cell range (series1=B1:B4) emitted
a numRef/strRef with no cached values, so the HTML preview plotted
nothing (the dataRange= path already cached). Backfill the cache from
the referenced cells, mirroring dataRange=. Verified: HTML renders the
bars and real Excel opens+shows the chart.
table cell content uses 1-based keys (r1c1, matching Word); 0-based or
mistyped keys (r0c0) were silently dropped because the border fan-out
used a .Where() over TrackingPropertyDictionary, marking every key read
and wiping the unsupported signal. Scan via Keys+TryGetValue so unread
keys surface as unsupported_property, and hint 'did you mean r1c1
(cell keys are 1-based)'.
Sheets that declare millions of value-less, formula-less, style-less
empty cells balloon the SDK DOM to GBs. WorksheetBloatFilter strips
them before the SDK parses the part (lossless — Excel and LibreOffice
discard such cells on load too). Gated: normal files take the original
direct-stream path untouched. Filtered sessions operate on a slimmed
in-memory copy and write back to the backing file on save/close so
mid-session snapshots and final closes still hit disk.
…g; share smooth/trendline/error-bar primitives with the line renderer

Scatter charts had two overlapping implementations: a dedicated
RenderScatterChartSvg (per-series xVal/yVal value axes) and a parallel
path that threaded scatter X through RenderLineChartSvg. The line path was
already shadowed by the scatter-PlotArea dispatch branch, so it was dead
code, and keeping both invited drift.

Consolidate onto RenderScatterChartSvg and delete the line-renderer scatter
path (the scatterX parameter, its value-axis block, and the dispatch-side
category parsing). Scatter keeps its correct per-series data model.

To avoid duplicating decorations, extract three primitives from
RenderLineChartSvg and call them from both renderers:
- BuildSmoothPath: Catmull-Rom -> cubic Bezier path
- AppendErrorBars: vertical (Y) error bars at each point
- AppendTrendline: regression line; takes the X data + an X-value->pixel
  mapper, so the caller owns axis positioning

Because the scatter renderer feeds AppendTrendline the real X values (xVal),
a scatter trendline now fits over the true X domain. The previous trendline
code always regressed over the 1-based category index, which produced a
wrong slope/equation for non-uniformly-spaced scatter X (e.g. X=10,20,40,
80,160 -> correct slope 0.3649, index-based slope 13.1). Line/category
charts still pass the index as X, so their output is unchanged (verified
byte-identical).
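
To illustrate why regressing over the category index distorts a scatter trendline, here is a small least-squares sketch. The Y data is made up for the example (y = 2x + 1), so the slopes differ from the figures quoted above; `ls_slope` is an illustrative helper, not AppendTrendline itself.

```python
def ls_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


x_real = [10, 20, 40, 80, 160]            # non-uniformly spaced scatter X
y = [2 * x + 1 for x in x_real]           # hypothetical data on a straight line
x_index = list(range(1, len(y) + 1))      # 1-based category index

print(ls_slope(x_real, y))   # 2.0  (fit over the true X domain)
print(ls_slope(x_index, y))  # 72.0 (fit over the index: wrong slope)
```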
A column of chained formulas (B1=A1, Bi=B{i-1}+Ai) silently produced a
wrong value once the chain exceeded 256 links: B270 over A1..A270=1..270
returned 36480 instead of 36585, exact through link 256 and drifting
after.

Root cause: _parseDepth, the per-formula parenthesis-nesting guard
(cap 256, a DoS backstop), was never reset when ResolveCellResult
re-entered EvaluateFormula to evaluate a referenced cell. Each chain
link leaked one frame into the counter, so at link 257 the parser bailed
mid-expression and the evaluator fell back to a blank cached value -
a silent zero folded into every downstream sum.

Fix: save and zero _parseDepth around the nested EvaluateFormula call
and restore it in finally, so the cap counts only the current formula's
own nesting. Cross-cell recursion depth stays guarded by the existing
TryEnsureSufficientExecutionStack probe and the MaxSameSheetDepth=1000
backstop, which surface a visible #NUM! instead of truncating silently.
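
The save/zero/restore pattern can be modelled in a few lines. This is a toy evaluator for the B1=A1, Bi=B{i-1}+Ai chain only, under the assumption that each formula evaluation adds one frame to the depth counter; class and method names are illustrative, not the project's code.

```python
class ChainEvaluator:
    """Toy model of a nesting guard leaking across cross-cell re-entry."""

    CAP = 256  # per-formula nesting cap

    def __init__(self, a_values, reset_depth: bool):
        self.a = a_values              # A1..An, stored 0-based
        self.reset_depth = reset_depth
        self._parse_depth = 0

    def evaluate(self, row: int) -> float:
        self._parse_depth += 1
        try:
            if self._parse_depth > self.CAP:
                return 0.0             # parser bails -> blank cached value
            if row == 1:
                return float(self.a[0])            # B1 = A1
            return self.resolve(row - 1) + self.a[row - 1]
        finally:
            self._parse_depth -= 1

    def resolve(self, row: int) -> float:
        if not self.reset_depth:
            return self.evaluate(row)  # old behaviour: counter leaks per link
        saved = self._parse_depth
        self._parse_depth = 0          # cap counts only the nested formula's own depth
        try:
            return self.evaluate(row)
        finally:
            self._parse_depth = saved  # restore for the outer formula


a = list(range(1, 271))                               # A1..A270 = 1..270
print(ChainEvaluator(a, reset_depth=False).evaluate(270))  # 36480.0
print(ChainEvaluator(a, reset_depth=True).evaluate(270))   # 36585.0
```

In this model the leaked counter exceeds the cap at the 257th frame, so the first 14 links contribute a silent zero (1+...+14 = 105), which matches the reported 36480 versus 36585.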
… HTML preview

A containsBlanks rule over D1:D5 with only D1 populated never showed its
fill: the HTML grid sized itself to the used data extent (one row), so
the blank in-range cells D2:D5 were not rendered at all and the CF map
was clamped to the same bounds.

Fix: merge raw conditional-formatting sqref extents (CfRangeExtents)
into both the grid dimensions and the CF evaluation bounds, clamped
through the existing row/column caps so whole-column references stay
bounded. containsBlanks evaluation already treats missing cells as
blank, so the synthesized cells pick up the fill.
\begin{aligned}a=1 \\ b=2\end{aligned} produced an OMML matrix (m:m)
instead of an equation array (m:eqArr), and the dump side then
serialized that matrix back as \begin{matrix}, so the equation lost its
alignment semantics in Word and the round-trip drifted on first pass.

Root cause: the align-family environments (align, aligned, gathered,
split) were routed through ParseMatrix alongside the true matrix
environments. The OMML-to-LaTeX direction already handled eqArr; it
just never fired because parse never produced one.

Fix: after ParseMatrix, convert each matrix row into one m:e of an
m:eqArr for align-family environments. pmatrix/bmatrix/matrix/cases
keep producing m:m. Round-trip is now stable: aligned -> eqArr ->
aligned.
…ional points-to-EMU multiply

Setting axis/border line specs (valAxisLine=FF0000:12700:dash) with an
EMU-scale width wrote a:ln/@w = 161290000 - the width slot was always
multiplied by 12700 (points to EMU), so values copied from real OOXML
overflowed the ST_LineWidth maximum of 20116800 and the document failed
schema validation. The dump side emits widths verbatim, so dump->batch
round-trips faithfully replayed the invalid width.

Fix: shared TryParseLineWidthEmu. Bare decimals keep their documented
meaning of points; unit-qualified values (1pt, 0.5mm, 12700emu) go
through EmuConverter.ParseEmu; bare integers above the legal point
ceiling are treated as raw EMU per the ParseEmu convention; the result
is clamped to the schema range so Set can never emit an invalid width.
Applied to every colon-spec and dotted .width mutator that shared the
same multiply: series.outline, gridline specs, valAxisLine/catAxisLine,
plotArea.border, chartArea.border, chart lineWidth, seriesN.lineWidth.
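
A sketch of the parsing rules listed above, under stated assumptions: `parse_line_width_emu` is a hypothetical stand-in for TryParseLineWidthEmu, only the pt/mm/emu units from the examples are handled, and the lower clamp bound is taken to be 0.

```python
import re
from typing import Optional

EMU_PER_POINT = 12700
MAX_LINE_WIDTH_EMU = 20116800          # ST_LineWidth maximum (1584 pt)
EMU_PER_UNIT = {"pt": 12700.0, "mm": 36000.0, "emu": 1.0}
_SPEC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def parse_line_width_emu(text: str) -> Optional[int]:
    """Return a schema-valid line width in EMU, or None if unparseable."""
    m = _SPEC.match(text)
    if not m:
        return None
    number, unit = m.group(1), m.group(2).lower()
    if unit:
        if unit not in EMU_PER_UNIT:
            return None
        emu = float(number) * EMU_PER_UNIT[unit]
    elif "." not in number and int(number) > MAX_LINE_WIDTH_EMU // EMU_PER_POINT:
        emu = float(number)            # bare integer above 1584: already EMU
    else:
        emu = float(number) * EMU_PER_POINT   # bare value: points
    return max(0, min(MAX_LINE_WIDTH_EMU, round(emu)))


print(parse_line_width_emu("1.5"))           # 19050
print(parse_line_width_emu("12700"))         # 12700 (raw EMU, not 161290000)
print(parse_line_width_emu("161290000emu"))  # 20116800 (clamped)
```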
…adback

Adding a shape with underline=true or strikethrough=single and no text
silently lost both properties: Get returned a Format without the keys,
so dump->replay dropped them.

Root cause: the write side already stored them on endParaRPr (the same
RunPropTargets fallback bold/italic use for runless shapes), but the
runless-shape reader surfaced only font/size/bold/italic/caps/color
from endParaRPr - underline and strike were never read back.

Fix: mirror the run-present underline/strike readers into the
endParaRPr branch (sng->single, dbl->double; SingleStrike/DoubleStrike/
NoStrike mapping), so the keys round-trip for shapes that have no text
yet.
…eview

highlight= was rejected as UNSUPPORTED on the Add path, had no curated
Set case, and never surfaced on Get, so a text highlight could not be
authored and view html emitted no background-color. The one existing
writer (find/replace formatting) built the color via BuildSolidFillColor,
which wrote 8-digit AARRGGBB into a:srgbClr/@Val - invalid ST_HexColorRGB
that renderers fell back to white.

Fix:
- Set: curated highlight case writes <a:highlight> through
  BuildColorElement, positioned by ReorderDrawingRunProperties so the
  rPr child order stays schema-valid (PowerPoint silently ignores
  out-of-order children).
- Add: highlight joins the shape effectKeys and the AddRun branch
  (fill-before-latin slot), accepted instead of UNSUPPORTED.
- Get: new ReadColorFromHighlight readback at run and shape level,
  canonical #-prefixed uppercase hex.
- find/replace path repointed to BuildColorElement, fixing the invalid
  8-digit srgbClr val.
- schemas/help/pptx/{run,shape}.json declare the property.

The HTML preview already mapped a:highlight to background-color; it now
receives real data.
Adding a field (or run) with color=accent1 writes
<w:color w:val="auto" w:themeColor="accent1"/>, but Get returned the
raw compound "auto;themeColor=accent1" instead of "accent1", breaking
the canonical rule that scheme colors pass through unchanged.

val="auto" carries no color information - Word resolves the run color
from the theme slot - so the compound head is pure noise for the
pure-theme form. StyleColorWithThemeTail now collapses
auto + themeColor (no shade/tint) to the bare scheme name. An explicit
hex val alongside themeColor keeps the full "HEX;themeColor=..." tail,
and any themeShade/themeTint modifier keeps the compound form too -
those carry information the bare name would lose.
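
The collapse rule can be sketched as below. `collapse_theme_color` is an illustrative name, and the exact serialization of the themeShade/themeTint tail is an assumption; only the `auto;themeColor=accent1` and `HEX;themeColor=...` shapes appear in the commit text.

```python
from typing import Optional


def collapse_theme_color(val: str,
                         theme_color: Optional[str] = None,
                         theme_shade: Optional[str] = None,
                         theme_tint: Optional[str] = None) -> str:
    """Collapse a pure-theme color to its bare scheme name; keep anything richer."""
    if theme_color is None:
        return val
    if val.lower() == "auto" and theme_shade is None and theme_tint is None:
        return theme_color             # val="auto" carries no color information
    parts = [val, f"themeColor={theme_color}"]
    if theme_shade is not None:
        parts.append(f"themeShade={theme_shade}")   # assumed tail format
    if theme_tint is not None:
        parts.append(f"themeTint={theme_tint}")     # assumed tail format
    return ";".join(parts)


print(collapse_theme_color("auto", "accent1"))    # accent1
print(collapse_theme_color("FF0000", "accent1"))  # FF0000;themeColor=accent1
```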
…hema-valid OOXML

Two table rebuild defects produced documents the OpenXML schema
validator rejects:

1. add table with rowBandSize/colBandSize wrote w:tblStyleRowBandSize /
   w:tblStyleColBandSize as direct tblPr children. The SDK's CT_TblPr
   particle for a table instance has no slot for them (they belong to a
   table STYLE's tblPr), so the rebuilt document failed validation with
   'invalid child element tblStyleRowBandSize'. They are now emitted
   inside an mc:AlternateContent/mc:Choice Requires="w" guard: Word
   processes the choice transparently per MCE rules (identical runtime
   behavior), while strict schema validation passes. The band-size
   reader unwraps one level of AlternateContent/Choice so readback and
   dump still surface rowBandSize/colBandSize - scoped to direct
   children, a tblPrChange snapshot does not leak its prior band sizes
   onto the live table.

2. Replaying a tracked-change cell merge appended w:tcPr AFTER the
   cell's w:p; CT_Tc requires tcPr as the first child, so validation
   failed with 'unexpected child element tcPr'. The replay path now
   inserts tcPr at the schema position.

Both round-trips validate clean with attributes preserved.
…o the paragraph mark

A paragraph whose content is a field chain with a formatted cached
result (e.g. a REF field whose first result run is bold) rebuilt with
the formatting duplicated: <w:b/> appeared both on the result run and
in <w:pPr><w:rPr> - the paragraph mark - so the rebuilt XML carried two
bolds where the source had one.

Root cause: Get surfaces first-run formatting on the paragraph node
(firstRun-fallback), and the emitter strips those harvested keys from
the paragraph props only when a run-typed child exists. A field chain
swallows all its text runs in CollapseFieldChains, leaving no run-typed
children - so the strip never fired, the harvested bold rode the
'add p' op, and the field emit (verbatim raw-set / add field) replayed
the same formatting a second time on the result run.

Fix: field entries count as format-bearing hoist sources, so the
paragraph-level strip fires for field-chain paragraphs too. The
paragraph mark keeps only genuine pPr/rPr formatting.
…uding single-run paragraphs

Inline <w:customXml> wrappers flatten to their inner runs on dump->batch
(structure is not replayable), but unlike the parallel smartTag flatten
the loss was silent: no warning reached the dump envelope or view
issues, so a consumer had no machine-readable signal that wrapper
semantics were dropped.

Two gaps:
1. A run-level <w:customXml> parses as a typed CustomXmlRun (smartTag
   parses as an unknown element and already took the marking path), so
   its inner runs were never stamped _wrapperFlattened.
2. The single-run collapse path folds the wrapped run into the
   paragraph's own text prop and bypassed EmitPlainOrHyperlinkRun,
   skipping the warning emission even for marked runs.

Fix: stamp _wrapperFlattened for runs under a CustomXmlRun ancestor,
and extract the warning emission into a shared WarnWrapperFlattened
used by both the collapse path and EmitPlainOrHyperlinkRun. Text
content round-trips as before; the flatten is now always announced.
Injecting <w:r><w:t xml:space="preserve"> </w:t></w:r> via raw-set
saved an empty <w:t/> - the space was destroyed at write time, before
any round-trip. Visible end-to-end as nested smartTags losing their
interstitial space-only run ('John Smith' rebuilt as 'JohnSmith').

Root cause: ParseFragment parsed fragments with XDocument
LoadOptions.None, which discards whitespace-only text nodes wholesale -
correct for formatting whitespace between elements, wrong for leaf
content.

Fix: parse fragments whitespace-preserved, then normalize: whitespace
text nodes whose parent has element children (formatting indentation)
are removed as before; whitespace that is a leaf element's entire
content is kept and stamped xml:space="preserve" so the subsequent SDK
InnerXml parse and later document reopens keep it too. Applies to
raw-set on all three formats; the change is additive (previously-dropped
content is now preserved), with no effect on element structure.
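
The normalization step can be approximated with Python's `xml.etree.ElementTree`, which keeps whitespace text on parse. This is a sketch of the rule as described, not the project's XDocument-based code; `normalize_fragment_whitespace` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def normalize_fragment_whitespace(elem: ET.Element) -> None:
    """Drop indentation whitespace; keep and mark whitespace-only leaf content."""
    children = list(elem)
    if children:
        # Whitespace between child elements is formatting indentation: remove it.
        if elem.text is not None and elem.text.strip() == "":
            elem.text = None
        for child in children:
            if child.tail is not None and child.tail.strip() == "":
                child.tail = None
            normalize_fragment_whitespace(child)
    elif elem.text and elem.text.strip() == "":
        # Whitespace that is a leaf's entire content is real data: preserve it.
        elem.set(XML_SPACE, "preserve")


root = ET.fromstring("<r>\n  <t> </t>\n</r>")
normalize_fragment_whitespace(root)
print(ET.tostring(root, encoding="unicode"))
```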
…e part

A docx whose word/theme/theme1.xml exists but is empty (or a bare
<a:theme/> without themeElements) dumped as a raw-set remove of the
theme part - the rebuilt document had NO theme at all. The input was
already broken (a theme without themeElements is schema-invalid and
Word refuses to open it), and the round-trip preserved the brokenness
in a different shape instead of healing it.

Root cause: EmitThemeRaw distinguishes 'theme part absent' from
'theme part present but degenerate' via the typed .Theme accessor,
which is null for both, so the present-but-degenerate case fell into
the remove branch meant for genuinely theme-less documents.

Fix: probe the package part URIs for /word/theme/ - a present-but-
degenerate part falls through to BlankDocCreator.BuildDefaultTheme, the
same schema-complete default theme a blank document gets, so the
rebuilt file opens in Word. A genuinely absent theme still emits the
remove.
goworm and others added 27 commits June 15, 2026 22:26
… a preceding merged cell

Column operations address cells by their ordinal position in the row
(cells[colIdx-1]). That equals the target grid column only when no earlier
cell in the row spans horizontally; a gridSpan before the target column
shifts the ordinal, so the operation silently acted on the WRONG cell —
e.g. removing column 3 in a row whose first cell spans columns 1-2 deleted
the column-4 cell and kept column 3. The merge guard inspected the same
ordinal cell, so a preceding span both evaded the guard and misdirected the
op (silent data corruption). The track-change column delete and the
add-column boundary check had no slot-aware guard at all.

Replace the ordinal merge check with a slot-aware guard that walks each row
accumulating gridSpan: it rejects when the target grid slot is itself merged
(gridSpan/vMerge) or when a preceding horizontal span makes the ordinal
differ from the slot. Wired into remove/move/copy/add-column and the
track-change column delete. This honors the existing "unmerge before
column-level operations" contract, turning silent corruption into a clear,
actionable error; clean (unmerged) tables are unaffected.
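
A simplified model of the slot-aware walk: each row is reduced to a list of gridSpan values in document order, and the guard accumulates spans to find which cell owns the target grid column. vMerge handling is omitted, and `check_column_op` with its messages is illustrative only.

```python
from typing import List, Optional


def check_column_op(row_spans: List[int], target_col: int) -> Optional[str]:
    """Return None when cells[target_col - 1] really is grid column target_col,
    otherwise a reason to reject the column operation."""
    slot = 1  # first grid column covered by the current cell
    for ordinal, span in enumerate(row_spans, start=1):
        if slot <= target_col < slot + span:
            if span > 1:
                return "target grid slot is merged (gridSpan)"
            if ordinal != target_col:
                return "a preceding span shifts the cell ordinal"
            return None
        slot += span
    return "target column is outside the row"


print(check_column_op([1, 1, 1, 1], 3))  # None: clean row, safe
print(check_column_op([2, 1, 1], 3))     # rejected: first cell spans columns 1-2
print(check_column_op([1, 2, 1], 2))     # rejected: target slot itself is merged
```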
…) round-trips instead of dangling

A presentation extLst can reference a custom binary part by relationship id —
Google Slides exports <go:slidesCustomData r:id="rIdN"> pointing at ppt/metadata
(rel type .../presentationmetadata, content type application/binary).
EmitPresentationExtras replays that extLst verbatim via raw-set, carrying the
r:id, but nothing re-created the relationship or the target part: the rebuilt
presentation.xml then carried a dangling rId and PowerPoint refused the deck.

Surface presentation-attached ExtendedParts via GetPresentationExtendedParts and
emit an add-part extpart row on /presentation that pins the source rId, the
custom rel type, content type and bytes. Extend the extpart add-part host set to
accept /presentation (alongside slide/layout/master). The part re-homes to the
SDK's ExtendedPart location but resolves by r:id, so the reference binds and the
bytes round-trip verbatim.
… of rejecting the picture add

A TIFF image part can carry the content type image/tif as well as the canonical
image/tiff. The MIME validation only recognised image/tiff, so a deck with an
image/tif picture aborted its add step with 'Unsupported MIME type: image/tif'.

Accept image/tif alongside image/tiff (mirroring the existing image/jpeg /
image/jpg alias), and add the same alias to the four thumbnail / image
content-type maps in the handler so a tif part is not silently relabelled png.
… dangling

A SmartArt diagram node can carry an external hyperlink (<a:hlinkClick r:id>)
whose relationship lives on the diagram data part's (or the DSP cached-drawing
part's) OWN .rels. add-part smartart recreated both parts empty and re-attached
only the embedded ImageParts, so the hyperlink relationship was dropped: the
replayed data/drawing XML kept the verbatim r:id but its .rels no longer
declared it, leaving a dangling relationship that PowerPoint refused
(0x80070570).

Carry each diagram part's external hyperlink relationships (rId + target) on the
SmartArtInfo, emit them as numbered dataHlink/drawingHlink props, and re-add them
via AddHyperlinkRelationship with the pinned source rId in the add-part smartart
handler. Mirrors the existing embedded-image carrier (dataImage/drawingImage).
Integrates the pptx dump→batch round-trip campaign branch (73 fix(pptx)
commits) covering carrier round-trips (images, themes, tags, extended/custom
parts, external links, slide-jump links, SmartArt images+hyperlinks,
presentation-level Google metadata), scheme/system/pattern colors, signed
color-transform offsets, connector variants, empty charts, negative insets,
group-child id stability, nested group set routing, and table cell txBodyRaw
robustness.
…round-trip

A table cell shaped [nested table, display equation] lost the equation on
dump→batch. The equation paragraph was emitted as `add equation` targeting
p[cellParaIdx] (= p[1]), but at replay p[1] is the cell's leading outer-seed
paragraph; the nested-table lead-cleanup then issues `remove p[1]`, deleting
that paragraph together with the equation just placed in it. The empty
paragraph the SDK seeds AFTER the nested table — the one a plain paragraph
correctly reuses via set p[last()] — was left untouched.

Mirror the plain-paragraph trailing-auto-p handling for equations: treat an
equation immediately after a nested table as the trailing auto-present
paragraph, and target p[last()] (the seeded post-table paragraph that
survives the remove) instead of p[cellParaIdx]. Equation-only cells and
equations after a text paragraph are unchanged.
… of dangling

A slideMaster/slideLayout can host an <p:oleObj r:id="rIdN"> (e.g. an embedded
clip-art OLE object) whose part is an EmbeddedObjectPart / EmbeddedPackagePart
on the master's own .rels. The master XML is replayed verbatim via raw-set
(keeping r:id="rIdN"), but the extended-part carrier only re-created
ExtendedPart blobs, so the OLE part + its relationship were dropped: the rebuilt
master's r:id dangled and PowerPoint refused the deck (0x80070570).

Broaden ReadExtendedPartInfos to also surface EmbeddedObjectPart and
EmbeddedPackagePart (alongside ExtendedPart), carrying each part's relationship
type, content type and bytes. They flow through the existing master/layout
add-part extpart carrier, which re-pins the source rId; the OLE relationship is
recreated with its .../oleObject type and the bytes round-trip verbatim.
ImageParts stay excluded (carried separately by GetMasterImageParts).
…ad of being dropped

A cell holding two block content controls — e.g. [plain SDT, rich SDT] —
lost the second one on dump→batch. EmitCellSdt chooses between inserting the
SDT before the cell's auto-seed paragraph (leading content) and appending it
(non-leading) from a `cellHasContent` flag, which was fed `firstParaSeen` —
a signal that only tracks PARAGRAPHS. A preceding SDT (or nested table)
leaves firstParaSeen false, so the second SDT still took the
insert-before-seed path; but the first SDT had already consumed or displaced
that seed paragraph, so the raw-set targeted a paragraph that no longer
existed and the control was dropped.

Track whether ANY cell content (paragraph, nested table, or SDT) has been
emitted and feed that to EmitCellSdt, so only a genuinely leading SDT
inserts before the seed and every later one appends. The seed-consumed flag
that drives a following paragraph's fresh `add p` keys off the same signal.
Sole SDT, leading SDT, SDT-then-paragraph and paragraph-then-SDT cells are
unchanged.
Brings the round-36 fix onto main: a slideMaster/slideLayout <p:oleObj r:id>
(embedded OLE object / package) is now re-pinned via the extended-part carrier
so the relationship resolves instead of dangling on round-trip.
…ips instead of flattening to text

A content control inside a table cell whose rich content referenced an
external relationship (a hyperlink, or an embedded image) was flattened to
plain text on dump→batch — the link, run formatting and multi-paragraph
structure were lost. The cell SDT emitter bailed straight to the text emit
on any external rel, while the body-level emitter already ships such an SDT
through the inlined-parts carrier (verbatim sdtXml + part/ext data with rel
ids rewritten on replay). The cell path simply lacked that carrier.

Mirror the body path in EmitCellSdt: when the rich cell SDT carries an
external rel, try the GetSdtEmitData carrier and emit `add sdt sdtXml=…`;
only fall back to the text flatten when a referenced part can't be resolved.
Because the carrier's `add sdt` appends after the cell's auto-seed paragraph,
drop that now-leading seed when the control is the cell's leading content so
the rebuilt cell matches the source shape; non-leading controls append after
existing content and need no cleanup. Header/footer hosts (no auto-seed) skip
the seed removal.
…al so following tables round-trip in place

A body-level rich content control whose content includes one or more
tables is shipped verbatim by the dump carrier (sdtXml). The carrier
emits the SDT — including its inner <w:tbl> — without routing through
EmitTable, so ctx.TableOrdinalBox is never advanced for the shipped
tables. At replay those tables still exist in document order and count
toward the `(//w:tbl)[N]` XPath that later cell-SDT / tblGrid raw-sets
resolve against, so every table that follows the carrier had its
selector land one (or more) tables early. The result: a sibling table's
cell content control wrapped the wrong cell, and the spurious nested SDT
this produced was dropped on the next SDK re-save, taking that cell's
drawing with it.

Advance TableOrdinalBox by the number of <w:tbl> opens in the shipped
sdtXml right after the carrier is emitted, keeping the emitter's
`(//w:tbl)` numbering in lockstep with replay. Tables following such a
carrier now round-trip in place with their cell content controls and
drawings intact.
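
Counting the table opens in a shipped fragment could look roughly like this. The regex approach and the name `count_table_opens` are assumptions; the point is that `<w:tbl>` must be counted while `<w:tblPr>`, `<w:tblGrid>` and similar elements must not.

```python
import re

# Match '<w:tbl' only when followed by whitespace, '>' or '/',
# so <w:tblPr>, <w:tblGrid>, <w:tblW> etc. are not counted.
_TBL_OPEN = re.compile(r"<w:tbl(?=[\s>/])")


def count_table_opens(sdt_xml: str) -> int:
    return len(_TBL_OPEN.findall(sdt_xml))


sdt_xml = (
    "<w:sdt><w:sdtContent>"
    "<w:tbl><w:tblPr/><w:tblGrid/><w:tr><w:tc>"
    "<w:tbl><w:tblPr/></w:tbl>"
    "</w:tc></w:tr></w:tbl>"
    "</w:sdtContent></w:sdt>"
)
table_ordinal = 3                       # hypothetical running ordinal before the carrier
table_ordinal += count_table_opens(sdt_xml)
print(table_ordinal)                    # 5
```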
…alformed num val=auto

A dataBar conditional-formatting rule documents its min/max bounds as "numeric
or 'auto'", where 'auto' requests automatic bounds. The add path treated any
non-null bound as a literal numeric value, so min=auto/max=auto serialized as
<cfvo type="num" val="auto"/> (and x14 <cfvo type="num"><f>auto</f>). That is
malformed — a num-typed cfvo requires a numeric val — so Excel silently dropped
the entire data bar on open (no bars rendered, though the file still validated
against the lax schema).

Normalize the 'auto' sentinel (case-insensitive) back to null before building
the cfvo elements, so both the 2007 dataBar and the 2010+ x14 counterpart take
their automatic-bound branches (type=min/max, autoMin/autoMax) — identical to
omitting the bound. Explicit numeric bounds are unaffected.
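
The sentinel normalization amounts to the following sketch, shown for the 2007-style cfvo only. `normalize_bound` and `cfvo_xml` are hypothetical helpers; the real code builds SDK elements rather than strings.

```python
from typing import Optional


def normalize_bound(bound: Optional[str]) -> Optional[str]:
    """Map the 'auto' sentinel (any case) back to None, i.e. 'no explicit bound'."""
    if bound is None or bound.strip().lower() == "auto":
        return None
    return bound


def cfvo_xml(bound: Optional[str], auto_type: str) -> str:
    """auto_type is 'min' or 'max', used when no numeric bound is given."""
    value = normalize_bound(bound)
    if value is None:
        return f'<cfvo type="{auto_type}"/>'
    return f'<cfvo type="num" val="{value}"/>'


print(cfvo_xml("auto", "min"))  # <cfvo type="min"/>
print(cfvo_xml("AUTO", "max"))  # <cfvo type="max"/>
print(cfvo_xml("10", "min"))    # <cfvo type="num" val="10"/>
```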
… xlsx CF rule family

The excel examples covered cell formatting, charts, and pivot tables but not
conditional formatting — the conditionalformatting element and its ~30 rule
types had zero coverage. Add a 7-sheet showcase, one rule family per sheet:
cellIs comparison, text matching, top/bottom/average, data bars, colour scales,
icon sets, and formula/date/duplicate/unique rules.

The build script drives the officecli Python SDK (resident pipe + batched
writes) rather than per-command subprocess calls, demonstrating the SDK as the
intended consumer for many-write workbooks. It falls back to the in-repo SDK
copy when officecli-sdk is not pip-installed. Validation runs against the saved
file from a fresh process, since CF differential fills live in the workbook-level
dxfs table.
… instead of dropping the row

A <w:sdt> (SdtRow) that is a direct child of <w:tbl> and whose sdtContent
wraps an entire <w:tr> — Word's locked-row shape, used by forms to make a
whole row read-only — was silently dropped on dump→batch round-trip: the
row, its cells, text and <w:lock> all vanished and the table rebuilt short.

Root cause: navigation enumerated rows with table.Elements<TableRow>(),
which sees only direct children, so a row nested inside an SdtRow wrapper
was invisible to Get / Query / dump and never emitted. CT_Tbl's content
model permits an SDT around one or more rows, so this is valid input.

Add GetTableRowsFlattened (mirroring the existing GetRowCellsFlattened
cell-flatten contract) and route the table row enumeration in
TableToNode / the tr navigation axis / the row walk through it, so wrapped
rows are counted and their cells/text round-trip via the typed emit. After
the typed row/cell content is applied, EmitTable patches each single-row
wrapper back to its verbatim <w:sdt> via raw-set replace in descending row
order (replacing w:tr[N] with the sdt removes it from the w:tr axis), which
restores the wrapper and its lock. Tables with row-level content controls
now round-trip with every row, its content and its lock intact.
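
The direct-children blind spot can be reproduced outside the SDK. The following is an illustrative sketch using Python's standard `xml.etree.ElementTree`, not the actual GetTableRowsFlattened code; `table_rows_flattened` is a hypothetical stand-in that descends through `<w:sdt>/<w:sdtContent>` wrappers.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(local: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{local}"


def table_rows_flattened(container):
    """Collect w:tr elements in document order, looking inside w:sdt wrappers."""
    rows = []
    for child in container:
        if child.tag == q("tr"):
            rows.append(child)
        elif child.tag == q("sdt"):
            content = child.find(q("sdtContent"))
            if content is not None:
                rows.extend(table_rows_flattened(content))
    return rows


SAMPLE = f"""<w:tbl xmlns:w="{W}">
  <w:tr><w:tc/></w:tr>
  <w:sdt>
    <w:sdtPr><w:lock w:val="sdtContentLocked"/></w:sdtPr>
    <w:sdtContent><w:tr><w:tc/></w:tr></w:sdtContent>
  </w:sdt>
  <w:tr><w:tc/></w:tr>
</w:tbl>"""

tbl = ET.fromstring(SAMPLE)
direct_rows = tbl.findall(q("tr"))      # direct children only: misses the wrapped row
flat_rows = table_rows_flattened(tbl)   # includes the row inside the w:sdt
```

Here `direct_rows` has two entries and `flat_rows` has three, which is the gap that made the locked row vanish.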
…flushed style edits

ValidateDocument validates a throwaway Clone of the package to avoid touching
the live document. But Clone(stream) reads every live part's stream, which
re-introduces the very desync the clone was meant to avoid: cloning a package
that has a loaded-but-unflushed StylesPart (e.g. one just created by a fill/font
edit) desyncs the SDK's dirty-tracking for that part, and it then serializes
EMPTY on the caller's next Save. The visible failure was "edit a cell style ->
validate in-session -> save" producing a 0-byte styles.xml ("Root element is
missing") and, for conditional-formatting rules, a cascade of dangling-dxfId
errors — a file the spreadsheet app reports as corrupt.

Flush each already-loaded part's DOM back to its stream before the clone, so
both Clone and the preflight read in-sync bytes and cannot desync. Only loaded
parts are flushed: an unloaded part cannot be dirty, and force-loading one would
make the caller's Save re-serialize an untouched part — validate must stay
read-only. DocumentFormat.OpenXml 3.x exposes no public IsRootElementLoaded, so
the loaded-root state is read from the private field reflectively, with the
whole pass best-effort (any per-part hiccup is swallowed; a renamed field
degrades to a no-op).
…t property surface

The word examples covered run, paragraph, table, and numbering formatting but
not the document-level surface — the `document` container's 67 settable
properties had no example. Add a showcase covering all seven groups: core +
extended metadata, page setup (size/orientation/margins/mirror/book-fold),
docDefaults (the run/paragraph defaults unstyled text inherits), theme palette
and major/minor fonts, CJK grid and spacing controls, font embedding, and
display/print/privacy flags.

Built on the officecli Python SDK (resident pipe + batched writes), with the
in-repo SDK fallback. Body paragraphs are intentionally unstyled so the
docDefaults font/size/colour inheritance is visible in the rendered document.
…rlink/image relationship on that part, not the main document

A rich content control (SDT) carrying an external hyperlink — or an image —
that sits directly at the root of a header or footer round-tripped with its
relationship registered on word/_rels/document.xml.rels instead of the
header/footer part's own rels. The r:id kept verbatim in word/footerN.xml
then pointed at a relationship that part did not have, so Word treated the
whole document as corrupt and refused to open it.

Root cause: ResolveImageHostPart walked run.Ancestors<Header>() /
run.Ancestors<Footer>() to find the host part. Ancestors excludes self, and
the SDT carrier in AddSdt passes the Footer/Header element ITSELF as the
parent when an `add sdt parent=/footer[N]` lands the control at the part
root — so the lookup found no header/footer ancestor and fell through to
MainDocumentPart. The relationship was created on the main document while
the dangling r:id stayed in the footer.

Use a self-or-ancestor walk (run as Footer ?? run.Ancestors<Footer>(), and
likewise for Header), mirroring the BUG-R14A fix already in ResolveHostPart.
Header/footer-root content controls that carry hyperlinks or images now
register their relationships on the correct part and the file opens.
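
The difference between an ancestors-only walk and a self-or-ancestor walk can be shown with a toy tree. This is a sketch under assumed names (`Node`, `resolve_host`); it models the lookup logic only and does not use the Open XML SDK.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    kind: str
    parent: Optional["Node"] = None


def ancestors(node: Node) -> Iterator[Node]:
    """Yield strict ancestors, excluding the node itself."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def self_and_ancestors(node: Node) -> Iterator[Node]:
    """Yield the node first, then its ancestors."""
    yield node
    yield from ancestors(node)


def resolve_host(node: Node, walk: Callable[[Node], Iterator[Node]]) -> str:
    """Return 'header' or 'footer' if the walk finds one, else fall back to 'main'."""
    for candidate in walk(node):
        if candidate.kind in ("header", "footer"):
            return candidate.kind
    return "main"


footer = Node("footer")
nested_run = Node("run", parent=Node("paragraph", parent=footer))
```

Passing the footer element itself with `ancestors` resolves to `"main"` (the bug), while `self_and_ancestors` resolves to `"footer"`. A run nested deeper in the footer resolves correctly either way, which is why only root-level controls were affected.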
…s instead of being dropped

The "forced page break, then a new section" idiom — a paragraph whose pPr
holds a <w:sectPr> and whose body is a single <w:r><w:br w:type="page"/></w:r>
— lost its page break on dump→batch. The section-carrier paragraph emits its
runs through a dedicated filter that admitted text, tab, bookmark, SDT and
drawing runs but not a pure break run (which surfaces as a child of type
"break" with empty text). The break fell through the run/r/picture gate and
was dropped, so the forced page break collapsed and every page after the
section boundary reflowed.

Admit type=="break" children in the carrier filter and emit them through
TryEmitBreakRun — the same helper the main paragraph run loop uses — so a
page / column / line break carried on a section-break paragraph survives.
…atting, not just color/underline/font/size/bold/italic

AddHyperlink built the wrapped run's rPr by hand and only covered
color, underline, the font slots, size, bold and italic. Every other
character property the source set on the link run was silently dropped on
dump→batch — most consequentially <w:vanish/>, so a hidden hyperlink (the
boilerplate / template-guidance links Word documents bury in vanish text)
came back as visible text and shifted the surrounding layout. The
per-script bold.cs / italic.cs / size.cs, the run languages, caps,
smallCaps, strike, vertAlign, position and friends were lost the same way.

Route the remaining character keys through the shared ApplyRunFormatting
helper — the same applier the plain-run path uses — after the special
color/underline/size/bold/italic/font handling. None of the added keys
collide with those slots, so a hyperlink run now preserves its vanish,
complex-script weights/size, language and decoration through the
round-trip.
…get works

A blank officecli xlsx was the only format that did not stamp a theme part —
docx and pptx blanks both do, and every real Excel workbook ships one. Without
it, setting workbook theme properties (theme.color.accentN, theme.font.major/
minor) silently no-opped: ThemeHandler had no ThemePart to write into, so the
keys were reported unsupported and theme-colour lookups came back empty, even
though the schema declares the surface settable.

Stamp the shared default theme into the WorkbookPart at create time, matching
the docx/pptx blanks. workbook theme.* now resolves and round-trips on a freshly
created file, closing the cross-format parity gap.
…rtion

The DrawingML effectLst child-ordering insert (blur → fillOverlay → glow →
innerShdw → outerShdw → prstShdw → reflection → softEdge) was implemented three
times: a table-driven copy in Core/DrawingEffectsHelper (run-level rPr), a
byte-identical table-driven copy in PowerPointHandler.Effects (shape-level
spPr), and a hand-written switch variant in ExcelHandler.Helpers.Drawing. The
Core copy's comment even documented the PPT array as a manually kept-in-sync
mirror.

Promote DrawingEffectsHelper.InsertEffectInSchemaOrder to internal and route
all shape-level callers through it; delete the PPT array + InsertEffectInOrder
method (13 call sites repointed) and the Excel switch variant (1 call site).
The three implementations were behavior-equivalent — same schema order, same
empty-list and unknown-type fallback to AppendChild — so this is a pure
de-duplication with no runtime change. Effect/schema-order suite (333 tests)
green.
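
For readers unfamiliar with the pattern, a table-driven schema-order insert looks roughly like the sketch below. It is written in Python with `xml.etree.ElementTree` and un-namespaced tags for brevity; the function name mirrors the helper but the body is an assumption, not the code in Core/DrawingEffectsHelper. How the real helper treats an effect of a type that is already present is not stated here, so this sketch simply inserts after it.

```python
import xml.etree.ElementTree as ET

# Child order of effectLst as listed in the commit message.
EFFECT_ORDER = [
    "blur", "fillOverlay", "glow", "innerShdw",
    "outerShdw", "prstShdw", "reflection", "softEdge",
]


def insert_effect_in_schema_order(effect_lst: ET.Element, effect: ET.Element) -> None:
    """Insert `effect` before the first existing child that must come after it."""
    if effect.tag not in EFFECT_ORDER:
        effect_lst.append(effect)  # unknown type: fall back to append
        return
    rank = EFFECT_ORDER.index(effect.tag)
    for index, existing in enumerate(effect_lst):
        if existing.tag in EFFECT_ORDER and EFFECT_ORDER.index(existing.tag) > rank:
            effect_lst.insert(index, effect)
            return
    effect_lst.append(effect)  # empty list, or nothing ranks later


effect_lst = ET.Element("effectLst")
for name in ["softEdge", "blur", "outerShdw", "glow"]:
    insert_effect_in_schema_order(effect_lst, ET.Element(name))
order = [child.tag for child in effect_lst]
```

Whatever order the effects are added in, `order` ends up in schema order, which is the property the three former copies each had to maintain.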
…longer gains a spurious checkbox glyph

AddFormField unconditionally wrote a ☐ / ☒ glyph as the FORMCHECKBOX
field result. Word renders the box from the ffData <w:checkBox>, and many
documents leave the field's cached result empty — the dump captures that
faithfully as text="". On replay the unconditional glyph turned every such
empty checkbox into a literal ☐ run (113 of them in a medical-device
questionnaire), so the rebuilt text no longer matched the source and the
glyph's font metrics — different from the ffData-rendered box — nudged
form/table layout.

Only synthesize the default glyph for a typed `add formfield type=checkbox`
that supplies no explicit result (no text/value key). When the dump passes
an explicit result — the cached glyph when the source stored one, or "" when
it did not — honor it. An empty-result checkbox now round-trips empty, a
cached or checked checkbox round-trips its glyph, and a fresh typed add
still gets a default ☐ / ☒.
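
The decision rule reduces to "an explicit result wins, even when it is empty". A small Python sketch follows; `checkbox_result` and the property-dict shape are hypothetical, and checking `text` before `value` is an assumption, since the message only says either key counts as explicit.

```python
def checkbox_result(props: dict, checked: bool) -> str:
    """Pick the field result for a FORMCHECKBOX.

    An explicit text/value key is honored verbatim, including "".
    Only when neither key is present is the default glyph synthesized.
    """
    for key in ("text", "value"):
        if key in props:
            return props[key]
    return "\u2612" if checked else "\u2610"  # ballot box with X / empty ballot box
```

The key point is the membership test (`key in props`) rather than a truthiness test: `props.get("text") or default` would bring back the bug, because an empty string is falsy.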
…set+get), matching docx

The pptx table-cell help schema split a single semantic value — a cell's
horizontal span — across two properties: `colspan` (declared read-only/get) and
`gridSpan` (declared settable). That violates "one canonical key per value" and
diverges from docx, which models it as one `colspan` (set+get, alias `gridspan`).
The visible symptom: set a span via `gridSpan` and `get` returned it under
`colspan`, so a schema-driven reader looking for `gridSpan` on get found nothing,
even though the value round-tripped.

The handler was already docx-consistent (accepts `gridspan`/`colspan` on set,
emits `colspan` on get) — only the schema was wrong. Merge the two declarations
into a single `colspan` (set+get, alias `gridspan`), dropping the standalone
`gridSpan`. `rowspan` stays get-only (pptx sets vertical merge via `merge.down`).
… width

Chart/shape/picture overlays are absolutely positioned into a per-anchor
box on top of the sheet grid. Two issues kept charts from matching Excel:

1. Fill — the card was sized from its width alone (the inner SVG used
   height:auto), so the plot never grew into the height left below the
   title, leaving an empty gap under the chart. Make the card a flex
   column (height:100%) so the plot fills the box; for a left/right
   legend, shrink the plot viewBox to the real plot area so the meet-fit
   does not letterbox.

2. Width — the box summed whole spanned columns but dropped the
   partial-column EMU offset, and EstimateChartSize fell back to 48pt
   (64px) for default columns instead of the grid's ~44.27pt (59px),
   clamping the card ~one column narrower than Excel. Feed each chart its
   EstimateChartSize width (offset included, grid-aligned metric) as the
   overlay box width; shapes/pictures pass 0 and keep the column-sum.

Validation: render a multi-chart .xlsx and compare to Excel — a 480px
chart now renders ~444px ending mid-column-N (was clamped a column short),
and every gallery chart aligns to its anchor row.
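
The unit arithmetic behind the width fix can be checked by hand. The sketch below is not EstimateChartSize; `span_width_pt` is a hypothetical helper for a two-cell anchor, using the standard conversions of 12700 EMU per point and 96 px per 72 pt, plus the ~44.27pt default column width quoted above.

```python
EMU_PER_PT = 12700        # 914400 EMU per inch / 72 pt per inch
DEFAULT_COL_PT = 44.27    # grid default column width quoted in this commit


def pt_to_px(pt: float) -> float:
    """Convert points to CSS pixels at 96 dpi."""
    return pt * 96.0 / 72.0


def span_width_pt(col_widths_pt: dict, from_col: int, from_off_emu: int,
                  to_col: int, to_off_emu: int) -> float:
    """Width of an anchor spanning from (from_col + offset) to (to_col + offset).

    Sums the whole columns in [from_col, to_col), then applies both
    partial-column EMU offsets instead of dropping them.
    """
    whole = sum(col_widths_pt.get(c, DEFAULT_COL_PT) for c in range(from_col, to_col))
    return whole - from_off_emu / EMU_PER_PT + to_off_emu / EMU_PER_PT


# Seven default columns plus a 25pt (317500 EMU) offset into the end column.
width_pt = span_width_pt({}, 1, 0, 8, 317500)
width_px = pt_to_px(width_pt)
```

Dropping the 317500 EMU offset in this example loses 25pt, and using 48pt instead of 44.27pt per default column shifts every whole column by about 5px, so the two errors compound across a wide chart.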

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eight

Rows without an explicit height got no height attribute, so the grid let
them shrink to content — empty rows rendered ~17px and 11pt data rows
~22px, while Excel renders default rows at 15pt (~20px). That distorts
the grid vertically and drifts it out of step with the chart overlay's
anchor math, which assumes the sheet default row height.

Give every row the explicit-or-default height, and trim the cell's
vertical padding (2px -> 1px) so default 11pt rows land on 15pt instead
of overgrowing. Rows with an explicit/auto-fit height (rotated text,
large fonts) keep their value via the existing RowHeights path.

Validation: render any sheet — empty rows are now 20px (were 17px), data
rows 20px (were 22px), and an absolutely-positioned chart's anchor row
lines up with the matching grid row.
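
Why content-sized rows break the overlay can be seen from how a row's top edge is derived. This is an illustrative Python sketch with hypothetical names (`row_height_px`, `row_tops_px`); it assumes the anchor math sums explicit-or-default heights, as the message describes, and that rows are 1-indexed.

```python
DEFAULT_ROW_PT = 15.0  # Excel's default row height, about 20px at 96 dpi


def row_height_px(explicit_pt):
    """Explicit height if the row has one, else the sheet default, in whole pixels."""
    pt = DEFAULT_ROW_PT if explicit_pt is None else explicit_pt
    return round(pt * 96 / 72)


def row_tops_px(explicit_heights_pt: dict, row_count: int) -> list:
    """Top edge of each row 1..row_count, accumulated from the rows above it."""
    tops, y = [], 0
    for row in range(1, row_count + 1):
        tops.append(y)
        y += row_height_px(explicit_heights_pt.get(row))
    return tops


default_tops = row_tops_px({}, 4)
mixed_tops = row_tops_px({2: 30.0}, 4)
```

If the grid instead renders empty rows at 17px while the overlay assumes 20px, the two disagree by 3px per row, so a chart anchored at row 30 would sit roughly 87px away from its grid row.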

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ow low

.chart-container carried `margin: 16px auto` from an older in-flow layout.
Charts are now absolutely positioned inside a per-anchor box, so that top
margin pushed every visible card down ~16px — landing it at the bottom of
its anchor row instead of the top — and made it overflow the box bottom.
Set margin: 0 so the card sits exactly on its cell anchor.

Validation: a chart anchored at row 3 now starts at row 3's top edge (was
row 3's bottom) and ends on its bottom anchor row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dragonwhites dragonwhites force-pushed the fix/excel-html-chart-overlays branch from 1dc5843 to 2bbf8cd Compare June 16, 2026 07:01
dragonwhites commented Jun 16, 2026

Rebased onto current main to clear the conflict.

The two conflict hunks sat right next to your recent changes, so I kept yours and layered the chart fixes on top:

  • your CssHexColor legend handling stays as-is;
  • your picture-span sizing (5b69619) stays, with the chart widthPtHint override running after the picture branch — pictures/shapes pass 0, so it only affects charts.

Re-verified all four fixes on a 12-chart gallery: charts fill their box at the correct width (~444px), sit exactly on their anchor rows, and grid rows no longer collapse below 15pt. Scatter is already handled by your own fix (cb7548b), which the rebase picked up.
